Fix the displayed loss in the sft trainer for gradient accumulation > 1 #102

hiyouga · 2025-01-13T18:25:08Z

What does this PR do?

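This PR fixes the loss value displayed by fsdp_sft_trainer when gradient accumulation is greater than 1. Each micro-batch loss is divided by the number of micro-batches before the backward pass, so the displayed loss should be the sum of these scaled values over all micro-batches, not a single scaled value. In addition, the PR adds an all-reduce that averages the loss across data-parallel (DP) ranks.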
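The following is a minimal pure-Python sketch of the difference. It assumes each micro-batch loss is divided by the number of micro-batches before `backward()`. The function names are hypothetical, and plain floats stand in for tensors. In PyTorch, the DP average would typically be computed with `torch.distributed.all_reduce` using `ReduceOp.SUM` followed by division by the world size, or with `ReduceOp.AVG` on the NCCL backend.

```python
def displayed_loss_buggy(micro_losses):
    """Hypothetical pre-fix behaviour: keeps only the last scaled micro-batch loss."""
    n = len(micro_losses)
    step_loss = 0.0
    for loss in micro_losses:
        scaled = loss / n      # scaled so accumulated gradients match one full batch
        step_loss = scaled     # bug: overwritten on every micro-batch
    return step_loss


def displayed_loss_fixed(micro_losses):
    """Hypothetical post-fix behaviour: sums the scaled micro-batch losses."""
    n = len(micro_losses)
    step_loss = 0.0
    for loss in micro_losses:
        step_loss += loss / n  # sum of scaled values == mean over micro-batches
    return step_loss


def all_reduce_mean(per_rank_losses):
    """Stand-in for an all-reduce that averages a scalar across DP ranks."""
    return sum(per_rank_losses) / len(per_rank_losses)


if __name__ == "__main__":
    micro_losses = [0.8, 0.6, 0.7, 0.5]   # made-up values for one rank
    print("buggy:", displayed_loss_buggy(micro_losses))
    print("fixed:", displayed_loss_fixed(micro_losses))
    print("dp mean:", all_reduce_mean([displayed_loss_fixed(micro_losses), 0.55]))
```

With four micro-batch losses whose mean is 0.65, the buggy path reports only the last scaled value (0.5 / 4 = 0.125). The fixed path reports the per-rank mean, which the all-reduce then averages over ranks.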

Environment

Python version: 3.10.12
PyTorch version: 2.5.1+cu124 (GPU)
Transformers version: 4.47.1
GPU type: NVIDIA H100
vLLM version: 0.6.6.post1

Reproduction

torchrun --standalone --nnodes=1 --nproc_per_node=4 \
    -m verl.trainer.fsdp_sft_trainer \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.prompt_key=prompt \
    data.response_key=answer \
    data.micro_batch_size=8 \
    model.partial_pretrain=meta-llama/Meta-Llama-3-8B-Instruct \
    trainer.project_name=gsm8k-sft \
    trainer.experiment_name=llama3_gsk8k_before_fix \
    trainer.total_epochs=4 \
    trainer.logger=['console','wandb']

With this setup, the training loss of Llama-3 8B on the GSM8K dataset is expected to stay above 0.5. The buggy version reports much lower values because it displays only a single scaled micro-batch loss.
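To see how large the under-reporting can be, the sketch below computes the number of gradient-accumulation steps. It assumes the global train batch size is split evenly across DP ranks and that each rank's share is then split into micro-batches. The global batch size of 256 is a hypothetical value that does not appear in the command above. The helper names are also hypothetical.

```python
def accumulation_steps(global_batch_size, dp_world_size, micro_batch_size):
    """Hypothetical helper: micro-batches per optimizer step on each DP rank."""
    if global_batch_size % dp_world_size != 0:
        raise ValueError("global batch size must be divisible by the DP world size")
    per_rank = global_batch_size // dp_world_size
    if per_rank % micro_batch_size != 0:
        raise ValueError("per-rank batch size must be divisible by the micro batch size")
    return per_rank // micro_batch_size


def expected_buggy_display(true_loss, n_steps):
    """Rough estimate: one micro-batch loss near the mean, divided by n_steps."""
    return true_loss / n_steps


if __name__ == "__main__":
    # nproc_per_node=4 and data.micro_batch_size=8 come from the command above;
    # the global batch size of 256 is an assumption for illustration only.
    steps = accumulation_steps(256, 4, 8)
    print("accumulation steps:", steps)
    print("approx. buggy display for a true loss of 0.6:", expected_buggy_display(0.6, steps))
```

Under these assumptions each rank accumulates 8 micro-batches, so a true loss around 0.6 would be displayed as roughly 0.075.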

Who can review?

@PeterSH6

Commit 421315d: Fix loss value for gradient accumulation > 1

vermouth1992 approved these changes Jan 14, 2025


vermouth1992 merged commit e230de8 into volcengine:main Jan 14, 2025
8 checks passed
